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We study the efficiency, precision and accuracy of all-electron variational and diffusion quantum 
Monte Carlo calculations using Slater basis sets. Starting from wave functions generated by Hartree- 
Fock and density functional theory, we describe an algorithm to enforce the electron-nucleus cusp 
condition by linear projection. For the 55 molecules in the G2 set, the diffusion quantum Monte Carlo 
calculations recovers an average of 95% of the correlation energy and reproduces bond energies to a 
mean absolute deviation of 3.2 kcal/mol. Comparing the individual total energies with essentially 
exact values, we investigate the error cancellation in atomization and chemical reaction path energies, 
giving additional insight into the sizes of nodal surface errors. 

PACS numbers: 02.70.Ss, 71.15.Nc, 31.15.-p, 



I. INTRODUCTION 

Ab initio variational Monte Carlo (VMC) and diffu- 
sion Monte Carlo (DMC) methods have been used suc- 
cessfully for systems containing hundreds and sometimes 
thousands of electrons^ Such calculations typically re- 
trieve 95% or more of the correlation energy within the 
fixed-node approximation based on a single determinant. 
Testing the accuracy of VMC and DMC results against 
experiment and other theoretical methods plays an im- 
portant role in the development of computer codes such 
as the CASINO packaged 

As a method that directly competes for accuracy and 
efficiency with deterministic quantum chemical methods, 
benchmark results are of great interest. Total electronic 
energies of atoms lend themselves to direct comparison 
as highly accurate reference values are available. DMC 
calculations can recover 99% or more of the correla- 
tion energy for first row atoms using multi-determinant 
and backflow wave functions. 3 Another well-known set of 
benchmark data are the atomization energies of the G2 
set of moleculesP These energies have been reproduced in 
DMC to high accuracy using pseudopotentialsP Calcula- 
tions of a selection of small molecules consisting of first- 
row atoms have proven all-electron DMC to be nearly as 
accurate as CCSD(T)/cc-pVTZP 

In every case, the dominant deviations were attributed 
to the fixed node approximation made in the DMC 
method. DMC satisfies the variational principle and 
therefore the energies are too high when an approxi- 
mate nodal surface is used. Comparing total energies 
for benchmarking purposes, DMC typically performs ex- 
tremely well. For atomization and chemical reaction en- 
ergies, however, error cancellation in the energy differ- 
ences becomes essential. Unfortunately, the quality of 
the nodal surface turns out to depend intricately on the 
chemical structure of each molecule or atom and the er- 
ror cancelation in DMC is less efficient and predictable 
than in competing methods. 

In this paper we present benchmark results for the 



aforementioned 55 molecules of the G2 set from all- 
electron DMC calculations using wave functions based on 
Slater-type orbitals. These results are directly compara- 
ble to pscudopotential-based calculations. Beyond this, 
however, the availability of highly accurate reference data 
for the total energies of the same set of molecules^ allows 
a deeper analysis of the relative nodal surface errors in 
various chemical species and permits an intuitive visual- 
ization of error cancelation in chemical reaction energies. 
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FIG. 1: Effect of the nuclear cusp constraint on orbitals ty(r) 
(left) and orbital local energies E\ oc — ^~ 1 H'^ (right) for the 
carbon atom. Three different general-purpose basis sets from 
the ADF package were used (top to bottom with increasing 
size and precision). For atoms, only s orbitals require cusp 
correction and are here shown before and after the constraint 
for comparison. Within the single-^ basis SZ the wave func- 
tion is severely distorted when the coefficient of the single 
ls basis function is adjusted; for the double-^ basis DZ, the 
local energy is still strongly distorted; for the quadruple-^ ba- 
sis QZ4P, the divergence in the local energy at the nucleus 
is cleanly removed, otherwise preserving the orbitals and the 
local energy. 
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FIG. 2: Comparison of atomic unrestricted HF (UHF) ener- 
gies. The reference energy^ is based on a cc-pV5Z-h GTO 
basis set and assumed to be exact within the published pre- 
cision of 0.1 kcal/mol. 
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FIG. 3: Atomic total energies computed within HF 
(ADF/QZ4P) and QMC compared with the exact results! 10 
The VMC run with the HF wave function reproduces the HF 
energy to within statistical error bars (demonstrating the neg- 
ligible effect of the cusp correction on the energy for an accu- 
rate basis set). Optimized Slater- Jastrow (SJ) wave functions 
recover roughly 60-85% of the correlation energy within VMC 
and 90-95% within DMC. The results agree very well with an 
earlier study based on numerical atomic orbitals (NAO) on a 
radial gridP 



TABLE I: Timing of VMC steps for the C 2 H 6 molecule. The 
STO/TZP basis set is of comparable precision to GTO and 
significantly more efficient to evaluate. For equal size of basis 
set, GTO without cusp correction performs as well as STO. 
The two types of cusp correction implemented in CASINO 
each add some computational overhead. The general purpose 
cusp correction (gpcc^P is a scheme that adds a correction 
function to each molecular orbital of arbitrary type, while 
the Gaussian cusp correction^ replaces the orbital close to 
nuclei by the exponential of a polynomial. For larger systems 
the computational cost is expected to grow linearly with the 
basis set size, promising up to a 45% performance gain for 
equivalent precision. 



II. SLATER-TYPE ORBITALS 

Slater-type orbitals (STO) were an important tool in 
quantum mechanics long before the availability of com- 
putational tools in physics and chemistry. 9 Inspired by 
the analytic solution of the hydrogen atom, STO basis 
sets were the first choice for a number of important ap- 
proximate studies in the early years of quantum chem- 
istry. With the arrival of computers, however, it turned 
out that Gaussian basis functions (GTO) lf) allow a far 
simpler efficient implementation due to the possibility of 
factorizing Gaussian functions in Cartesian coordinates 
and the simplicity of evaluating multi-center integrals. 
They became the standard in quantum chemistry to the 
point that chemists typically discuss "basis sets" , implic- 
itly referring to a GTO basis. 



While the relative merits of the GTO and STO rep- 
resentations of orbitals in quantum chemistry are still 
under debatep^ attempts have been made to reduce the 
computational effort of working with STO basis setspH^ 
leading to the development of several STO-based elec- 
tronic structure codesP^EH Of these, the only code ac- 
tive development and available for general use is ADF, 19 
offering a full state-of-the-art implementation of Hartree- 
Fock (HF) and density functional theory (DFT) elec- 
tronic structure calculations for molecules and, via its 
sister program BAND, for periodic systems. 

Quantum Monte Carlo (QMC) methods have very 
different computational requirements from conventional 
non-stochastic electronic structure methods. Without 
the need to perform analytic integrations, the usual ad- 
vantages of GTO become irrelevant. Instead, the bulk 
of the computational cost lies in evaluating the trial 
wave function and its derivatives at arbitrary positions 
in space. A basis set that achieves the same precision 
with a more compact representation (fewer basis func- 
tions) will gain a clear advantage. Furthermore, STO 
wave functions allow an exact treatment of the Kato cusp 
condition at nucleP^ and do not suffer from divergent lo- 
cal energies at large distance from a molecule. For these 
reasons, STO basis sets have often been used in QMC. 

In the past, the main disadvantage of STO basis sets 
in QMC calculations has stemmed from the use of a trial 
wave function commonly generated from a preliminary 
HF or DFT calculation; almost all suitable mainstream 
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FIG. 4: Comparison of basis set errors in the HF energies for various STO basis sets. The QZ4P basis is used as a reference. 
Judging by Fig. [5] the remaining error is less than 1 kcal/mol for each second-row atom. The GTO 6-311G basis is roughly 
equivalent in precision to the STO TZP basis set which typically contains about 45% more basis functions. Geometries and 
GTO reference data were the same as those used in Ref. [7] (see EPAPS materiaP^). The dashed line separates the molecules 
containing only first-row atoms from those also containing second-row atoms. 



local basis set electronic structure codes for finite systems 
- particularly in the quantum chemistry community - 
use GTO. To exploit the advantages of STO for QMC 
calculations, one could therefore use either a conversion 
step0 or optimize the orbitals directly within VMC.^ 
The advent of the ADF code has allowed the generation 
of HF or DFT orbitals directly in an STO basis, which 
can then be used in QMC calculations. 

GTO basis sets have been available in the CASINO 
program for over a decade. We have now implemented 
the additional capability to evaluate orbitals expanded in 
a STO basis, allowing the use of trial wave functions gen- 
erated by ADF in VMC and DMC calculations!^ In the 
following, we present details of the cusp constraint used 
in the converter and demonstrate the precision achievable 
with the combination of ADF and CASINO. 

Each molecular orbital is expanded in the STO basis 
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with iVbas basis functions of the form 

&(r) = F^^y^e-^, 



(1) 



(2) 



with Q > and the Y" ; m are the Laplace spherical har- 
monics. Abandoning orthogonality in favour of simplic- 
ity, the Laguerre polynomials present in the analytic so- 
lutions for the hydrogen atom are replaced by r n . 

The centers Ri of the basis functions usually coincide 
with the positions Ri of the AT nuc point-like nuclei I of 
charge Zj. In principle, a single value of C would al- 
low the construction of a complete basis set by including 
sufficiently high orders n. In practice, however, using 
a small number of different £ values is a more efficient 



means of improving the precision of the basis set. A 
further improvement in precision can be made by includ- 
ing high-angular-momentum basis functions to improve 
the description of polarization. In this work, we used 
four general-purpose basis sets from the ADF package, 
in increasing size and precision: single-^ (SZ), double-^ 
(DZ), triple-^-polarized (TZP) and quadruple- £-f our fold- 
polarized (QZ4P). 



III. CUSP CONSTRAINT 

The exact wave function of particles interacting via 
a Coulomb interaction fulfills the Kato cusp condition 
whenever two point-like particles coalesce.^ For pairs of 
electrons, this condition gives rise to dynamic correla- 
tions that can be very efficiently represented by a Jas- 
trow factorJ^SH!!] j n a ll_ e lectron calculations, each single- 
electron orbital ^> should fulfill the cusp condition 



dr 



*(r) 



= -Z/*(R/) 



(3) 



r=Rj 



in the vicinity of each point-like nucleus I of charge Zj 
at position R/, where (-) n denotes the spherical average 
around the nucleus. 

In methods such as DFT or HF, having the exact cusp 
at the nucleus is less important than the overall quality of 
the wave function, so smooth Gaussian functions can be 
used to represent the wave function. QMC, on the other 
hand, is based on evaluating the local energy which di- 
verges at coalescences if the cusp condition is not exactly 
fulfilled. 7 When using GTO-based trial wave functions in 
QMC, the cusp condition is typically enforced artificially, 
either by modifying the GTO basis function*^ or by 
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FIG. 5: Comparison of various calculated bond energies from ab initio computations. The deviation A_Ebond = Sb on d — E^°f 
from the experimental reference energy given in Ref. [24] (including zero-point motion, relativistic and spin-orbit corrections) is 
shown. Our GTO-based DMC calculations were performed with deliberately less effort than the STO-based ones. GrossmarP 
chose a DMC approach similar to ours but using pseudopotentials whereas we have used an all-electron approach. The excellent 
values obtained by Feller et a/!™ are based on fixed-core coupled cluster computations with a careful choice of basis set for each 
molecule including core- valence corrections. 



directly con structing a correction to the single-electron 
orbitals.^ 

In contrast to smooth GTO basis functions, STO- 
based orbitals [Eq. ([2])] are able to fulfill the cusp condi- 
tion [Eq. (j3|] exactly, leading to one linear constraint per 
nucleus / on the coefficients Cj of any molecular orbital 
vf/. These N nuc constraints can be expressed as a single 
matrix equation 

Ex* c < = °> ( 4 ) 

i 

where the N nuc x A^as elements of the constraint matrix 
are given by 

Xi = S^RjSi^olSmfi (C» - Zi) - 5 nt ,i] — 
-Z i (1-S Ri m i )^ (Ri-B,). 

In principle, these linear constraints can be enforced dur- 
ing a HF computationj^but this is rarely implemented in 
electronic structure codes. Instead a wave function origi- 
nating from a code such as ADF can be cusp-corrected by 
a linear projection. To restrict the effect of the cusp cor- 
rection to the vicinity of the nuclei, we fix all coefficients 
except those of the narrowest Is basis function on each 
nucleus, and adjust the coefficients of the N nuc remaining 
orbitals to fulfill Eq. Q. As the cusp conditions on dif- 
ferent nuclei are nearly independent, this linear problem 
is always well-conditioned and has a unique solution. 



Fig. [T] demonstrates the effect of this cusp correction 
scheme on the orbitals and local energies. The single- 
ts basis set clearly does not leave sufficient freedom for 
the correction and the single Is basis function is severely 
distorted. For larger basis sets, however, the singular- 
ity in the local energy is cleanly removed with negligible 
distortion of the orbitals away from the cusp. 



IV. ATOMIC TOTAL ENERGIES 

We computed the HF energies of the first- and second- 
row atoms with ADF (see Fig. [2]) to compare the qual- 
ity of available basis sets. We found the ADF/QZ4P 
basis-set error to be below 0.1 kcal/mol for the first row 
atoms and below 1 kcal/mol for the second row atoms. 
The corresponding DMC energies are expected to be less 
sensitive to errors in the orbital basis than HF energies. 
For comparison, GTO-based HF energies were computed 
with the CRYSTAL program,^ using a 6-311G basis set, 
also displayed in Fig. [2] 

To evaluate the combined approach of using ADF and 
CASINO, the total energies of the first- and second-row 
atoms including correlation effects are compared with 
exact reference values in Fig. [3] as well as with earlier 
QMC data using wave functions defined on a radial grid. 2 
A VMC calculation using the Slater-determinant wave 
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FIG. 6: Visualization of the error cancellation in bond energies. The horizontal bars on the left and right of each reaction 



correspond to the error in the total energy of each species AE to t = E\ 



dmc 
tot 



-E(oti where the reference total energies are based 



on experimental atomization energies, atomic total energies and theoretical corrections P Within the statistical precision, these 
errors are due almost entirely to the fixed node approximation. The vertical extent of each bar is the error per electron, which 
is in the range of 1-3 mHa/electron except for H, H2, He, Li, LiH, BeH, and Li2 for which the nodal surface is exact or nearly 
so. The horizontal extent is the number of electrons. The difference in the areas on the right and left side of each reaction 
represents the error in the bond energy, which is shown in the center. 



function reproduced the HF energy, confirming that en- 
forcing the cusp constraint preserves the overall quality 
of the wave function. 

For each atom, we optimized a Jastrow factor^ 
consisting of electron-electron, electron-nucleus and 
electron-electron-nucleus terms. The Jastrow factor was 
expressed as a power expansion in the inter-particle dis- 
tances with the parameters: C = 3, N u = N x = 12, 
N e f n = 2, iVf = 3 (see Eqs. 19, 20, and 21 of Drum- 
mond et alW^}. For the VMC optimization, we mini- 
mized the mean absolute deviation of the energy from 
the mean energy over a set of hOQQ{)/N c \ cc configurations, 



performing five consecutive optimization iterations. This 
conservative choice of parameters allowed a reliable, au- 
tomated optimization procedure. The resulting Slater- 
Jastrow (SJ) wave functions recover 60-85% of the cor- 
relation energy. 

DMC calculations using the optimized SJ wave func- 
tions recovered 90-95% of the correlation energy for Be 
and heavier atoms. (Time step errors^ were eliminated 
by linear extrapolation.) H and He do not have a nodal 
surface, so DMC produces the exact energy. The HF 
nodal surface for Li is extremely good, and DMC recov- 
ers more than 99.5% of the correlation energy in this 
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FIG. 7: The same visualization scheme as introduced in Fig. [6] 
applied to some of the chemical reactions studied in Ref. 1361 
In most cases, the nodal surface of the unsaturated molecule 
on the left is described significantly worse by single determi- 
nant wave function than the saturated molecules on the right, 
leading to a systema tic o verestimation of reaction energies as 
reported previously 

case. The total energies of the first row atoms show 
excellent agreement with previous QMC results!^ This 
indicates that the remaining error is due to the fixed- 
node approximation and presents the limit of the single 
determinant SJ method which can only be bettered by 
improving the nodal surface, e.g. by using backflow^ or 
multi-determinant wave functions!^ 



V. THE G2 SET OF MOLECULES 



atoms. On average, we recovered 82% of the correlation 
energy within VMC. 

The SJ wave functions thus obtained were then used 
as the trial wave functions for DMC. For each system, 
two runs at timesteps dt = 0.01 and dt = 0.001 were 
performed, allowing a linear extrapolation dt — > 0. We 
used a total of 40 CPU hours per electron (Intel XEON, 
3 GHz) for each molecule, reaching a statistical precision 
of about 25 /xhartree per electron. 

For comparison, the bond energies computed using dif- 
ferent methods are shown in Fig. [5j The mean absolute 
deviation (MAD) of our STO-based bond energies from 
the experimental reference values is 3.2 kcal/mol. This is 
slightly larger than the deviation of 2.9 kcal/mol found 
in the pseudopotential-based DMC study of GrossmanHl 
however, those values excluded the relativistic and spin- 
orbit corrections which shift individual reference values 
by up to 2 kcal/mol and would increase the MAD to 
3.1 kcal/mol. Indeed, the MAD between Grossman's 
pseudopotential results and our all-electron results is just 

2.0 kcal/mol, showing the strong correlation between 
the errors in the two sets of results. Our GTO-based 
all-electron DMC calculations, which were deliberately 
performed without time step extrapolation or any kind 
of fine tuning of the basis sets or other computational 
parameters, gave a MAD from the reference values of 

5.1 kcal/mol. This larger MAD shows that time-step 
extrapolations and a careful choice of basis sets is impor- 
tant in obtaining accurate single-determinant SJ DMC 
results. 

Overall the obvious correlations between the errors of 
the three independent DMC-based attempts clearly sug- 
gest that a further systematic improvement can only be 
achieved by going beyond the fixed-node DMC approach 
with single-determinant SJ trial wave functions. This 
confirms the finding of Grossman that the fixed-node ap- 
proximation dominates the remaining error. 



Further benchmarking was performed for the bond en- 
ergies of the 55 molecules of the G2 set P The bond energy 
-E'bond is t ne difference between the molecular and atom- 
ized total energies, not including the zero point motion 
of the nucleP or any further corrections PSl 

First of all we used the G2 set to analyze the quality 
of the various basis sets by comparing the molecular HF 
energies (see Fig. Based on this data and the atomic 
data, we chose the QZ4P basis set for all further work as 
it systematically gives the smallest basis set error. 

To enable a direct comparison of results, the HF calcu- 
lations (Fig. [4]) were performed using the same molecular 
geometries as the previous GTO calculations reported in 
Ref. [7l For all other computations we used the more 
precise geometries from Ref. 25 where available, and for 
the remaining molecules we used those from Ref. [6 which 
were provided by the authors.^ The geometry of SiH2 in 
the triplet state was obtained from Ref. l38l 

For each molecule, we optimized a Jastrow factor with 
the same set of terms and parameters as used for the 



VI. ERROR CANCELLATION 

The variational principle guarantees that the fixed 
node approximation leads to an overestimate of the total 
energy for each molecule and atom. The bond energy, 
being a difference of total energies, therefore shows sig- 
nificant error cancellation. Studying the bond energies 
in Fig. [5] reveals only very limited systematics about the 
sign and the magnitude of the errors in the bond energies. 
The picture becomes much clearer when we directly com- 
pare the errors in the total energies for the molecules and 
their constituent atoms. We found that the nodal surface 
error lies in the range of 1-3 mHa/electron for each atom 
and molecule in our test set, except for a very limited set 
of species for which the nodal surface is exact or nearly 
so. This allows us to visualize the error cancellation in a 
very compact and intuitive manner (Fig. |6| . 

We observe a limited number of cases where both 
molecule and constituent atoms are described well, lead- 
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ing to an accurate bond energy (LiH, Li2). In some cases, 
all species involved show similar nodal errors per electron, 
leading to strong error cancellation (e.g. CO, C0 2 , Na 2 , 
S12). In many other cases, however, the quality of the 
wave function is very different for the various species, 
which may lead to a large net error (e.g. NO, SO2), 
although the errors may also largely cancel (e.g. LiF, 
NaCl). 

For most molecules containing hydrogen and carbon 
(e.g. CH3, CH4), one can observe that the wave func- 
tion of the molecule is described significantly better than 
that of the carbon atoms, leading to a systematic over- 
estimation of the bond energy. (Even though the absence 
of a nodal surface error for the hydrogen atom might in- 
tuitively suggest the opposite.) A similar effect can also 
be seen in the visualization of chemical reaction energies 
(Fig. [7]). Here, one can observe that fully hydrogenated 
molecules are typically described better by the fixed node 
approximation than molecules containing double or triple 
bonds. 

Finally, one can observe that second row atoms and 
their molecules typically show significantly larger nodal 
surface errors than first row atoms. The errors in bond 
energy, however, are not necessarily larger, indicating 
that these nodal surface errors arise mainly from the core 
electrons and that they cancel in energy differences. 

VII. CONCLUSIONS 

To conclude, we have demonstrated the accuracy of 
STO trial wave functions generated by the ADF soft- 
ware packages in VMC and DMC calculations using the 
CASINO program. Using the QZ4P basis set from ADF, 
the basis set errors are below 0.1 kcal/mol for first-row 
atoms and below 1 kcal/mol for second-row atoms. DMC 
calculations for the G2 set of molecules recovered on av- 



erage 95% of the correlation energy. Due to partial er- 
ror cancellation, the atomization energies could be repro- 
duced to a mean absolute deviation from the experimen- 
tal values of 3.2 kcal/mol. 

The errors in the total energies of individual molecules 
and atoms, which originate - apart from statistical er- 
rors - almost entirely from the fixed node approximation, 
were then used to analyse and visualize the error can- 
cellation in atomization and chemical reaction energies. 
While we find that the nodal error from the core electrons 
in the second row atoms largely cancels out, other error 
cancellations seem more coincidental than systematic. 

As our bond energies are of similar quality to those 
obtained previously in pseudopotential calculations, we 
may assume that we have reached the limit in accuracy 
that is possible with a nodal surface obtained from single 
Slater determinant wave functions. Systematic studies of 
the nodal surface of multideterminant wave functions^ 
indicate that significant improvements can be achieved 
with reasonable effort. Using backflow functional or 
geminal wave functional should also lead to higher ac- 
curacy QMC results while retaining its excellent perfor- 
mance and scaling behavior. 

The molecular geometries and the full set of results are 
provided in electronic form, available from EPAPSpS 
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